CountIn() does not assume the requests has 2 or more pods count #4355
base: main
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: tenzen-y. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.
✅ Deploy Preview for kubernetes-sigs-kueue ready!
/hold
I'm not sure why we need special handling for ResourcePods in this function, since both the Requests and the capacity contain the constraint, so it can be treated like any other resource.
Also, I think the semantics of an unlimited Pods count capacity is not what kube-scheduler does. Can you double check? From my brief look earlier I thought kube-scheduler assumes 0 in that case.
If there is something confusing about the function, maybe it can be clarified by a comment?
Then I think we have 3 options to stabilize this function for ...
@mimowo Either approach is valid, IMO. Let me know what you think.
TBH, I'm not sure what problem we are trying to solve, and the modification makes the function more complex, which risks subtle bugs, so I'm cautious. Let me ask some questions:
If we need to choose one of the semantics, I would definitely be in favor of following kube-scheduler, meaning handling it as 0. But again, I'm not sure this function needs to be aware of Pods; for example, other functions in the package aren't aware of Pods.
/assign
That makes sense. In that case, let me align with kube-scheduler and add edge-case handling when the requests are multiple Pods. Let's say the current request (receiver) is ... EDIT: I refined the example to clarify the problem.
Can you elaborate on why the result would be invalid if the receiver has ResourcePods=2? Assuming the result is indeed wrong, could we just return an error / panic rather than support it? We can extend the support later if needed, I think.
We can consider this case
I'm ok with returning an error if the requests (receiver) have a pods count of two or more. My point is just that I want to remove the unhandled case. Are you ok with returning an error in that situation (a pods request of 2 or more)?
I thought the count would be 1, no? We basically compute how many pods with these requests will fit in the capacity, so there would be one group which requests 3 pods inside a capacity of 4 pods. Yeah, this gets tricky. I'm ok to just return an error whenever the value is different than 1, yes.
We always calculate
EDIT: I fixed the example case.
Signed-off-by: Yuki Iwai <[email protected]>
Force-pushed from 2843808 to fad17dd
Yes, the capacity can accommodate 3 pods, but when I introduced this function, its aim was to be just a math helper function computing ...
That makes sense for the objective.
I changed this PR to return an error when the requests have 2 or more Pods.
/retitle CountIn() does not assume the requests has 2 or more pods count
@tenzen-y: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/hold EDIT: the semantics would be as currently implemented. I think this makes sense. Think of the receiver as the PodGroup: it requests 3 pods, and the total capacity is 4 pods. So the question is "how many groups of pods like this can I fit into the capacity?" The answer is 1, so I think this is correct. Maybe just a better comment is needed.
That makes sense (I was looking at older comments when I left my comments above).
In any case, I think that this PR is not needed.
Maybe rename the receiver or add docs instead? I think the "mental model" of PodGroupRequests is quite useful to have in mind; basically, "count how many pod groups like this can fit into the capacity" is the question answered by the function.
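For illustration, a minimal Go sketch of that mental model, assuming hypothetical types (the `Requests` map and `CountIn` below are simplified stand-ins, not the actual kueue resources API):

```go
package main

import "fmt"

// Requests maps a resource name to a requested quantity (illustrative only).
type Requests map[string]int64

// CountIn answers "how many groups of pods like this fit into the capacity":
// for each requested resource it computes capacity/request (integer division)
// and returns the minimum across resources.
func (r Requests) CountIn(capacity Requests) int32 {
	count := int32(-1)
	for name, qty := range r {
		if qty == 0 {
			continue
		}
		fits := int32(capacity[name] / qty)
		if count == -1 || fits < count {
			count = fits
		}
	}
	if count == -1 {
		return 0
	}
	return count
}

func main() {
	// The example from the thread: a receiver requesting 3 pods against a
	// capacity of 4 pods fits exactly once.
	group := Requests{"pods": 3, "cpu": 2}
	capacity := Requests{"pods": 4, "cpu": 8}
	fmt.Println(group.CountIn(capacity)) // prints 1
}
```

With that reading, getting 1 for the 3-pods-into-4-pods case is the intended answer rather than a bug, which matches the /hold comment above.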
Interesting, I didn't expect that. Do you have more details? Is this expected?
Yeah, if you prefer to keep that work in this PR, I'm fine with that. I just assumed I would work on it in a separate PR.
Oh, it might be a bug... When I added a logger at kueue/pkg/cache/tas_flavor_snapshot.go (lines 535 to 537 in fad17dd), I got ...
Ah, I found the root cause. This is a bug since ...
I feel like I've been fixing bugs every week recently...
Not good. Because we cherry-picked the "fix", should we consider 0.10.2 buggy?
Separate is fine, for sure.
Technically, yes. But it happens only when TAS is used with preemption, since only preemption calls ...
Ok, so it is ok from the user PoV; we only release preemptions in 0.11.
Yeah, we do not need to rush to prepare the next patch release.
Thanks. So I will extract 2 PRs from this PR: one is a struct name change, and one is a bug fix.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
I refined `CountIn()` in the resources package since it does not consider the Pods count despite being exposed to external packages. Additionally, I replaced the `req` receiver with `r` to use the same receiver name for all `Requests` receivers.
As we discussed in this PR, we decided not to handle requests that have 2 or more Pods.
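For illustration only, a hedged sketch of the guard discussed in the thread (returning an error when the receiver already requests 2 or more Pods). It reuses the hypothetical `Requests` type from the sketch above; `CountInStrict` is an assumed name, not the actual kueue code:

```go
// CountInStrict is a hypothetical variant of CountIn that rejects receivers
// requesting 2 or more pods instead of silently counting them.
func (r Requests) CountInStrict(capacity Requests) (int32, error) {
	if pods, ok := r["pods"]; ok && pods >= 2 {
		return 0, fmt.Errorf("unsupported requests: pods count %d must be at most 1", pods)
	}
	return r.CountIn(capacity), nil
}
```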
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
This is a follow-up PR of #4271.
#4271 (comment)
Does this PR introduce a user-facing change?